Add robust median to gopher filter #98

Merged: 16 commits, Jan 30, 2024

@KennethEnevoldsen KennethEnevoldsen commented Jan 7, 2024

This pull request adds a robust median function to the gopher filter, which would otherwise fail on empty docs such as "" or " ".

Sorry for the multiple commits; I wanted to add @TTTTao725 as a co-author, as he found the bug.
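
For readers following along, here is a minimal sketch of the failure and of one way to guard against it. It assumes the median comes from Python's standard-library `statistics.median`, which raises `StatisticsError` on empty input; the names `robust_median`, `median_word_length`, and the `default` parameter are hypothetical, and the merged implementation may differ.

```python
from statistics import StatisticsError, median
from typing import Sequence


def robust_median(values: Sequence[float], default: float = 0.0) -> float:
    """Hypothetical helper: a median that tolerates an empty input."""
    if not values:
        # A blank document has no words, so the median is undefined;
        # return a caller-chosen fallback instead of raising.
        return default
    return float(median(values))


def median_word_length(text: str) -> float:
    # str.split() with no arguments returns [] for "" and for " ".
    return robust_median([len(word) for word in text.split()])


# The plain median raises on the empty list that a blank document produces.
try:
    median([len(word) for word in " ".split()])
    plain_median_failed = False
except StatisticsError:
    plain_median_failed = True
```

With this guard, `median_word_length("")` and `median_word_length(" ")` return the fallback rather than raising. The fallback value of `0.0` is only an illustrative assumption.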

@KennethEnevoldsen
Contributor Author

Updated in accordance with failing tests

0,
self.character_count,
type="median_word_length",
score=self.median_word_length,
Member

Given that median_word_length is bool | float, wouldn't this make score potentially a bool? score is supposed to be a float, so we would have to cast back.

Contributor Author

It seems to start out as False, so I tried to match the existing pattern. However, the median can be undefined (empty list), and several values could represent that (np.nan, 0, False). I would probably go for np.nan, or None if that is valid?
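
To make the trade-off between these sentinels concrete, the sketch below shows how each candidate behaves in plain Python. It uses the standard library's `float("nan")` as a stand-in for `np.nan` (both are IEEE NaN floats), and `is_defined` is a hypothetical helper, not part of the project.

```python
import math
from typing import Optional

# bool is a subclass of int, so False silently behaves like the number 0.
assert isinstance(False, int)
assert False == 0 and float(False) == 0.0

# NaN keeps the float type but never compares equal, even to itself.
nan = float("nan")
assert isinstance(nan, float)
assert nan != nan and math.isnan(nan)


def is_defined(score: Optional[float]) -> bool:
    """Hypothetical check: True only for a real, non-NaN numeric score."""
    if score is None or isinstance(score, bool):
        return False
    return not math.isnan(score)
```

In this sketch, `False` and `0` cannot be told apart by value, `None` changes the type to an optional, and NaN stays a float but needs an explicit `math.isnan` check wherever the score is compared.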

@soldni
Member

soldni commented Jan 15, 2024

tnx!! added a comment, looks good otherwise.

@soldni
Member

soldni commented Jan 15, 2024

for the python style error, you should be able to run make style to fix any issue :)

@KennethEnevoldsen
Contributor Author

Hi @soldni fixed the style errors!

@KennethEnevoldsen
Contributor Author

@soldni it seems like this is awaiting an approval for the tests to be run

@soldni
Member

soldni commented Jan 29, 2024

Hey @KennethEnevoldsen! I've made a few more changes:

  • I found some issues in how black and isort were configured, which were causing tests to fail on GitHub but not locally. Sorry for the hassle it caused!
  • I changed the median such that it always returns a float.

I've approved this PR, lmk if it looks good to you too before I merge!

@KennethEnevoldsen
Contributor Author

Thanks for taking care of that. Everything looks good to me

@soldni soldni merged commit 50abfb7 into allenai:main Jan 30, 2024
13 checks passed